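Load and process the 2019 global trade records for copper ores and concentrates (commodity code 2603) obtained from the UN Comtrade Database:
Note: reporters in the data may include countries and regions, as well as China's Hong Kong, Macao, and Taiwan regions; for convenience, all of them are referred to below as "countries".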
from src.main import *
data_overview()
| | Reporter Code | Reporter | Partner Code | Partner | Trade Flow | Trade Value (US$) |
|---|---|---|---|---|---|---|
| 2 | 31 | Azerbaijan | 702 | Singapore | Export | 419159 |
| 3 | 31 | Azerbaijan | 757 | Switzerland | Export | 3043343 |
| 4 | 31 | Azerbaijan | 826 | United Kingdom | Import | 79 |
| 5 | 31 | Azerbaijan | 860 | Uzbekistan | Export | 17248944 |
| 6 | 31 | Azerbaijan | 899 | Areas, nes | Import | 1007 |
| ... | ... | ... | ... | ... | ... | ... |
| 1146 | 682 | Saudi Arabia | 699 | India | Export | 119513031 |
| 1147 | 682 | Saudi Arabia | 784 | United Arab Emirates | Import | 133847 |
| 1148 | 682 | Saudi Arabia | 842 | USA | Import | 678083 |
| 1150 | 804 | Ukraine | 156 | China | Import | 3199 |
| 1152 | 818 | Egypt | 682 | Saudi Arabia | Export | 1039 |
998 rows × 6 columns
check_data()
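Total number of countries that reported trade records (distinct): 89
Total number of partner countries in the reported records (distinct): 109
Total number of reporting and partner countries combined (distinct): 119
The figures above show that the reported records are clearly inconsistent: more countries appear in the records (119) than filed reports (89). At the very least, this means that some countries that took part in the trade did not report any records.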
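The three counts above can be reproduced with plain set operations. The following is a minimal sketch, assuming each record is reduced to a `(reporter, partner)` pair; the helper name `count_countries` and the four sample records are illustrative and are not taken from `src.main`.

```python
def count_countries(records):
    """Return (#reporters, #partners, #countries appearing in either role)."""
    reporters = {reporter for reporter, _ in records}
    partners = {partner for _, partner in records}
    return len(reporters), len(partners), len(reporters | partners)


# Illustrative (reporter, partner) pairs only; not the full data set.
sample = [
    ("Azerbaijan", "Singapore"),
    ("Azerbaijan", "Switzerland"),
    ("Saudi Arabia", "India"),
    ("Ukraine", "China"),
]
n_reporters, n_partners, n_all = count_countries(sample)
print(n_reporters, n_partners, n_all)
```

A union that is larger than the reporter set means that some partners never filed a report of their own.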
The records reported by China are used below to check and illustrate this:
1. View the import records reported by China
view_logs_by_china()
Number of exporting countries in the import records reported by China: 56
| | Reporter Code | Reporter | Partner Code | Partner | Trade Flow | Trade Value (US$) |
|---|---|---|---|---|---|---|
| 247 | 156 | China | 8 | Albania | Import | 17013004 |
| 248 | 156 | China | 36 | Australia | Import | 1666113958 |
| 250 | 156 | China | 51 | Armenia | Import | 510563982 |
| 251 | 156 | China | 68 | Bolivia (Plurinational State of) | Import | 2677980 |
| 252 | 156 | China | 76 | Brazil | Import | 557705819 |
| 255 | 156 | China | 100 | Bulgaria | Import | 50 |
| 256 | 156 | China | 104 | Myanmar | Import | 2850825 |
| 258 | 156 | China | 124 | Canada | Import | 910149220 |
| 260 | 156 | China | 152 | Chile | Import | 12076025483 |
| 261 | 156 | China | 170 | Colombia | Import | 56382455 |
| 262 | 156 | China | 178 | Congo | Import | 520238 |
| 263 | 156 | China | 180 | Dem. Rep. of the Congo | Import | 386995691 |
| 265 | 156 | China | 212 | Dominica | Import | 57 |
| 266 | 156 | China | 214 | Dominican Rep. | Import | 123783836 |
| 267 | 156 | China | 218 | Ecuador | Import | 4952430 |
| 268 | 156 | China | 231 | Ethiopia | Import | 10326 |
| 269 | 156 | China | 232 | Eritrea | Import | 105319016 |
| 271 | 156 | China | 360 | Indonesia | Import | 618533133 |
| 272 | 156 | China | 364 | Iran | Import | 23701028 |
| 275 | 156 | China | 398 | Kazakhstan | Import | 964109620 |
| 276 | 156 | China | 404 | Kenya | Import | 501479 |
| 278 | 156 | China | 410 | Rep. of Korea | Import | 709 |
| 279 | 156 | China | 417 | Kyrgyzstan | Import | 199725 |
| 280 | 156 | China | 418 | Lao People's Dem. Rep. | Import | 458829040 |
| 281 | 156 | China | 450 | Madagascar | Import | 440772 |
| 282 | 156 | China | 458 | Malaysia | Import | 34683639 |
| 283 | 156 | China | 478 | Mauritania | Import | 209832472 |
| 284 | 156 | China | 484 | Mexico | Import | 1986012762 |
| 285 | 156 | China | 490 | Other Asia, nes | Import | 111136358 |
| 286 | 156 | China | 496 | Mongolia | Import | 1795514292 |
| 287 | 156 | China | 504 | Morocco | Import | 17746697 |
| 288 | 156 | China | 508 | Mozambique | Import | 2099031 |
| 289 | 156 | China | 516 | Namibia | Import | 18079 |
| 290 | 156 | China | 566 | Nigeria | Import | 9853 |
| 292 | 156 | China | 586 | Pakistan | Import | 193330 |
| 293 | 156 | China | 591 | Panama | Import | 387665029 |
| 294 | 156 | China | 598 | Papua New Guinea | Import | 159701250 |
| 295 | 156 | China | 604 | Peru | Import | 9052329871 |
| 296 | 156 | China | 608 | Philippines | Import | 268285022 |
| 297 | 156 | China | 642 | Romania | Import | 12115525 |
| 298 | 156 | China | 643 | Russian Federation | Import | 234860468 |
| 299 | 156 | China | 682 | Saudi Arabia | Import | 152889096 |
| 300 | 156 | China | 688 | Serbia | Import | 111 |
| 301 | 156 | China | 699 | India | Import | 78282781 |
| 302 | 156 | China | 704 | Viet Nam | Import | 22101901 |
| 303 | 156 | China | 706 | Somalia | Import | 154153 |
| 304 | 156 | China | 710 | South Africa | Import | 17009652 |
| 305 | 156 | China | 716 | Zimbabwe | Import | 783889 |
| 306 | 156 | China | 724 | Spain | Import | 872750734 |
| 307 | 156 | China | 764 | Thailand | Import | 115337 |
| 308 | 156 | China | 784 | United Arab Emirates | Import | 70360785 |
| 309 | 156 | China | 792 | Turkey | Import | 60658162 |
| 310 | 156 | China | 826 | United Kingdom | Import | 74307 |
| 311 | 156 | China | 834 | United Rep. of Tanzania | Import | 8523190 |
| 312 | 156 | China | 842 | USA | Import | 13566717 |
| 313 | 156 | China | 894 | Zambia | Import | 23748142 |
2. View the export records to China reported by China's import partners
view_logs_about_china()
Countries worldwide that reported export records to China: 43
| | Reporter Code | Reporter | Partner Code | Partner | Trade Flow | Trade Value (US$) |
|---|---|---|---|---|---|---|
| 13 | 36 | Australia | 156 | China | Export | 1585564428 |
| 36 | 51 | Armenia | 156 | China | Export | 174415305 |
| 68 | 68 | Bolivia (Plurinational State of) | 156 | China | Export | 3882938 |
| 88 | 76 | Brazil | 156 | China | Export | 442971407 |
| 116 | 100 | Bulgaria | 156 | China | Export | 71216190 |
| 161 | 124 | Canada | 156 | China | Export | 764979617 |
| 208 | 152 | Chile | 156 | China | Export | 9649325301 |
| 316 | 170 | Colombia | 156 | China | Export | 40352028 |
| 328 | 204 | Benin | 156 | China | Export | 1708 |
| 334 | 218 | Ecuador | 156 | China | Export | 3562481 |
| 386 | 276 | Germany | 156 | China | Export | 7449316 |
| 416 | 344 | China, Hong Kong SAR | 156 | China | Export | 152061 |
| 427 | 360 | Indonesia | 156 | China | Export | 599729496 |
| 465 | 381 | Italy | 156 | China | Export | 35819 |
| 507 | 398 | Kazakhstan | 156 | China | Export | 738608504 |
| 515 | 404 | Kenya | 156 | China | Export | 15282 |
| 526 | 410 | Rep. of Korea | 156 | China | Export | 260380148 |
| 550 | 418 | Lao People's Dem. Rep. | 156 | China | Export | 589052464 |
| 562 | 450 | Madagascar | 156 | China | Export | 511442 |
| 571 | 458 | Malaysia | 156 | China | Export | 182078771 |
| 600 | 490 | Other Asia, nes | 156 | China | Export | 772507869 |
| 616 | 496 | Mongolia | 156 | China | Export | 1795868367 |
| 621 | 504 | Morocco | 156 | China | Export | 21256334 |
| 635 | 516 | Namibia | 156 | China | Export | 2991391 |
| 685 | 586 | Pakistan | 156 | China | Export | 314633 |
| 700 | 604 | Peru | 156 | China | Export | 8318062306 |
| 731 | 608 | Philippines | 156 | China | Export | 242323556 |
| 780 | 643 | Russian Federation | 156 | China | Export | 224055289 |
| 793 | 688 | Serbia | 156 | China | Export | 41 |
| 805 | 699 | India | 156 | China | Export | 293533475 |
| 835 | 704 | Viet Nam | 156 | China | Export | 71467 |
| 854 | 710 | South Africa | 156 | China | Export | 31830175 |
| 896 | 724 | Spain | 156 | China | Export | 746141115 |
| 980 | 792 | Turkey | 156 | China | Export | 76043743 |
| 1028 | 842 | USA | 156 | China | Export | 4248920 |
| 1069 | 894 | Zambia | 156 | China | Export | 1657677 |
| 1086 | 180 | Dem. Rep. of the Congo | 156 | China | Export | 566480615 |
| 1106 | 268 | Georgia | 156 | China | Export | 171641028 |
| 1122 | 478 | Mauritania | 156 | China | Export | 195646154 |
| 1129 | 484 | Mexico | 156 | China | Export | 2067770798 |
| 1142 | 682 | Saudi Arabia | 156 | China | Export | 135065597 |
| 636 | 516 | Namibia | 156 | China | Re-Export | 2973160 |
| 1107 | 268 | Georgia | 156 | China | Re-Export | 65397569 |
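Comparing the two tables above, the import and export figures reported by different countries clearly disagree:
First, the numbers of trading partners do not match: China's records show copper imports from 56 countries in 2019, whereas the partner-side table contains only 43 records of exports to China. Two of those are re-exports by Namibia and Georgia, which also appear as exporters, so just 41 distinct countries reported such exports.
Second, even where both sides reported a record, the reported trade values differ, sometimes widely. For example, China reported imports of US$12,076,025,483 from Chile, while Chile reported exports of US$9,649,325,301 to China.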
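The comparison of the two tables can be sketched in code. This is a minimal illustration, assuming each table has been reduced to a mapping from partner name to reported value in US$; the function name `mirror_gaps` is hypothetical, and the values are a handful of rows copied from the two tables above.

```python
def mirror_gaps(imports_by_china, exports_to_china):
    """Compare both sides of each trade flow into China.

    Both arguments map a partner name to a reported trade value in US$.
    Returns partners seen only on China's side, partners seen only on the
    partner side, and the import/export value ratio where both sides reported.
    """
    china_side = set(imports_by_china)
    partner_side = set(exports_to_china)
    only_china = sorted(china_side - partner_side)
    only_partner = sorted(partner_side - china_side)
    ratios = {
        partner: imports_by_china[partner] / exports_to_china[partner]
        for partner in china_side & partner_side
    }
    return only_china, only_partner, ratios


# A few values copied from the two tables above.
imports_by_china = {"Chile": 12076025483, "Mongolia": 1795514292, "Albania": 17013004}
exports_to_china = {"Chile": 9649325301, "Mongolia": 1795868367, "Germany": 7449316}

only_china, only_partner, ratios = mirror_gaps(imports_by_china, exports_to_china)
print(only_china, only_partner, ratios)
```

In this sample the Mongolia figures nearly coincide, the Chile figures differ by roughly a quarter, and Albania and Germany each appear on one side only.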
This may be due to the following reasons:
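Based on the data and analysis above, a directed weighted network is an appropriate model. It is constructed as follows:
Countries are the nodes, import/export relationships give the edge direction (exporter -> importer), and trade values are the edge weights.
For import/export relationships, differences between the countries' reports are ignored: if a trade relationship appears in any record, the two countries are treated as trading and an edge is added between the corresponding nodes.
For trade values, if the two sides report different amounts, their mean is used.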
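The construction rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind `net`: the function name `build_edges` and the record layout `(reporter, partner, flow, value)` are hypothetical, and skipping flows other than `Import` and `Export` (for example `Re-Export`) is an assumption, since the text does not say how they are handled.

```python
from collections import defaultdict


def build_edges(records):
    """Return {(exporter, importer): weight} for a directed weighted graph.

    records: iterable of (reporter, partner, flow, value) tuples.
    When both sides reported the same flow, the weight is the mean value.
    """
    reported = defaultdict(list)
    for reporter, partner, flow, value in records:
        if flow == "Export":
            edge = (reporter, partner)   # reporter is the exporter
        elif flow == "Import":
            edge = (partner, reporter)   # partner is the exporter
        else:
            continue  # ASSUMPTION: other flows such as Re-Export are skipped
        reported[edge].append(value)
    return {edge: sum(values) / len(values) for edge, values in reported.items()}


# Values copied from the tables above.
records = [
    ("China", "Chile", "Import", 12076025483),
    ("Chile", "China", "Export", 9649325301),
    ("China", "Albania", "Import", 17013004),
]
edges = build_edges(records)
print(edges)
```

An edge reported by only one side keeps that single value, which follows from the rule that any record is enough to create an edge.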
Draw the network:
net.draw()
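where $\Gamma_{i\_in}$ is the set of neighbor nodes pointing to node $i$, $\Gamma_{i\_out}$ is the set of neighbor nodes that node $i$ points to, $k_j$ is the degree of the corresponding node, and $\theta \in [0,1]$ is a parameter.
Specifically, in the directed weighted network, the degree $k_j$ of node $j$ is defined as:
where $w_{uv}$ is the weight of the edge from node $u$ to node $v$, taken as $0$ if that edge does not exist, and $\lambda$ is the parameter that weights incoming against outgoing edges.
In other words, the expression above sums the weights of the incoming and outgoing edges between node $j$ and its neighbors, each group scaled by its share of $\lambda$.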
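The degree formula itself is not reproduced in this text, so the following is only one reading that is consistent with the description: $k_j = \lambda \sum_u w_{uj} + (1-\lambda) \sum_v w_{jv}$. Which side $\lambda$ applies to is an assumption, as are the function name `weighted_degree` and the toy edge weights.

```python
def weighted_degree(edges, node, lam=0.5):
    """Weighted sum of the in-strength and out-strength of `node`.

    edges: {(u, v): w_uv} for the directed edge u -> v.
    ASSUMPTION: lam scales incoming edges and (1 - lam) outgoing edges;
    the source does not show which side lam applies to.
    """
    in_strength = sum(w for (u, v), w in edges.items() if v == node)
    out_strength = sum(w for (u, v), w in edges.items() if u == node)
    return lam * in_strength + (1 - lam) * out_strength


# Toy weights, not real trade values.
edges = {("Chile", "China"): 10.0, ("Peru", "China"): 8.0, ("China", "Spain"): 2.0}
k_china = weighted_degree(edges, "China", lam=0.75)
print(k_china)
```

Missing edges contribute nothing to either sum, which matches the convention that $w_{uv} = 0$ when the edge does not exist.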
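Define the connection probability $P_{i_j}$ as the probability that node $i$ is chosen by its neighbor node $j$ for a connection: $$ P_{i_j} = \frac{k_i}{A_j}, \quad (j\in\Gamma_i) $$
The connection information entropy in the directed network is defined as follows:
The absolute value is taken here because, owing to the $\theta$ weighting, $P_{i_j}$ may be greater than $1$.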
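The entropy formula is not reproduced in this text. As a rough sketch only, the code below assumes a Shannon-style sum $E_i = \sum_{j\in\Gamma_i} \left| P_{i_j} \log_2 P_{i_j} \right|$ with $P_{i_j} = k_i / A_j$ as defined above. The sum form, the base-2 logarithm, and the function name `connection_entropy` are all assumptions, and the values of $A_j$ are taken as given rather than derived.

```python
import math


def connection_entropy(i, neighbors, k, A):
    """Sum of |P * log2(P)| over the neighbours j of node i, with P = k[i] / A[j].

    ASSUMPTIONS: the Shannon-style sum and the base-2 logarithm are guesses;
    A[j] is supplied by the caller; k[i] and A[j] must be positive.
    """
    total = 0.0
    for j in neighbors[i]:
        p = k[i] / A[j]
        total += abs(p * math.log2(p))
    return total


# Toy numbers chosen so that one probability is below 1 and one is above 1.
k = {"a": 2.0}
A = {"b": 4.0, "c": 1.0}
neighbors = {"a": ["b", "c"]}
entropy_a = connection_entropy("a", neighbors, k, A)
print(entropy_a)
```

With $P = 0.5$ the term $P \log_2 P$ is negative, and with $P = 2$ it is positive; taking the absolute value makes both contributions non-negative, which is the role the text assigns to it.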
net.drawEntropiesBar()
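Apply hierarchical clustering to the information entropy of each node, with the number of clusters set to 6, and give each node a cluster label from 1 to 6 in descending order of entropy.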
nodes = net.getSortedEntropies()
Es_clusters, nodes = cluster_nodes(nodes, "E", "label", 6)
show_cluster_list(nodes, "label")
| label | code | name |
|---|---|---|
| 1 | 156 | China |
| 2 | 152 | Chile |
| 3 | 724 | Spain |
| 4 | 100,604,616,410,268,276,842,392,757,56,116,434 | Bulgaria,Peru,Poland,Rep. of Korea,Georgia,Ger... |
| 5 | 124,528,398,699,76,36,484,490,894,710,516,246,458 | Canada,Netherlands,Kazakhstan,India,Brazil,Aus... |
| 6 | 608,752,360,643,381,51,688,826,792,180,642,860... | Philippines,Sweden,Indonesia,Russian Federatio... |
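The labelling step above can be illustrated on one-dimensional data. The sketch below is not the implementation of `cluster_nodes`, whose linkage method is not shown; it swaps in single-linkage clustering, which in one dimension amounts to cutting the sorted values at the widest gaps (ties aside). The function name `cluster_1d` and the sample values are hypothetical.

```python
def cluster_1d(values, n_clusters):
    """Label each value 1..n_clusters; label 1 holds the largest values.

    Cutting the sorted values at the (n_clusters - 1) widest gaps gives the
    same partition as single-linkage agglomerative clustering in one dimension.
    Requires n_clusters <= len(values).
    """
    # Indices of the values, ordered from largest to smallest value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    gaps = [values[order[r]] - values[order[r + 1]] for r in range(len(order) - 1)]
    # Ranks after which a new cluster starts (the widest gaps).
    widest = sorted(range(len(gaps)), key=lambda r: gaps[r], reverse=True)
    cuts = set(widest[: n_clusters - 1])
    labels = [0] * len(values)
    label = 1
    for rank, idx in enumerate(order):
        labels[idx] = label
        if rank in cuts:
            label += 1
    return labels


# Made-up entropy values, not results from the network.
entropies = [9.1, 0.2, 4.0, 0.3, 3.8, 0.1]
labels = cluster_1d(entropies, 3)
print(labels)
```

A long-tailed distribution of entropies tends to produce a few singleton clusters at the top and one large cluster at the bottom under this kind of gap-based split.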
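Select the following attributes and choose the number of cluster levels for each:
Attribute values are labelled 1-6 from largest to smallest.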
nodes, attributes = set_attributes(nodes)
for attr, values in attributes.items():
    cluster, nodes = cluster_nodes(
        nodes, attr, attr, values["layer"])
    values["cluster"] = cluster
show_nodes_attribute(nodes)
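| | code | name | IS | OS | DC | BC | CC | label |
|---|---|---|---|---|---|---|---|---|
| 0 | 156 | China | 1 | 6 | 1 | 1 | 1 | 1 |
| 2 | 152 | Chile | 6 | 1 | 2 | 2 | 2 | 2 |
| 4 | 724 | Spain | 4 | 5 | 3 | 4 | 2 | 3 |
| 28 | 757 | Switzerland | 6 | 6 | 5 | 5 | 3 | 4 |
| 27 | 392 | Japan | 2 | 6 | 5 | 6 | 3 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 52 | 40 | Austria | 6 | 6 | 6 | 6 | 4 | 6 |
| 49 | 642 | Romania | 6 | 6 | 5 | 5 | 4 | 6 |
| 48 | 643 | Russian Federation | 5 | 6 | 5 | 6 | 3 | 6 |
| 117 | 212 | Dominica | 6 | 6 | 6 | 6 | 6 | 6 |
| 118 | 50 | Bangladesh | 6 | 6 | 6 | 6 | 6 | 6 |
119 rows × 8 columns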
Generate a decision tree with the ID3 algorithm
decision_tree = generate_Decision_Tree(nodes, attributes)
show_dt_accuracy(nodes, decision_tree)
{'IS': [1, 2, 3, 4, 5, 6], 'OS': [1, 2, 3, 4, 5, 6], 'BC': [1, 2, 3, 4, 5, 6], 'DC': [1, 2, 3, 4, 5, 6], 'CC': [1, 2, 3, 4, 5, 6]}
Decision tree accuracy: 94.9579831932773 %
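ID3 picks, at each node of the tree, the attribute with the highest information gain. The sketch below shows that criterion only; it is not the code of `generate_Decision_Tree`. The function names are illustrative, and the four rows use the DC, CC and label values of China, Chile, Spain and Switzerland from the attribute table above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target="label"):
    """Reduction in label entropy obtained by splitting `rows` on `attr`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# DC, CC and label for China, Chile, Spain and Switzerland.
rows = [
    {"DC": 1, "CC": 1, "label": 1},
    {"DC": 2, "CC": 2, "label": 2},
    {"DC": 3, "CC": 2, "label": 3},
    {"DC": 5, "CC": 3, "label": 4},
]
best = max(["DC", "CC"], key=lambda a: information_gain(rows, a))
print(best)
```

On these four rows DC separates every label while CC leaves Chile and Spain together, so DC has the larger gain and would be chosen for the split.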
Generate the decision table from the decision tree:
decision_list = ID3.generateList(decision_tree)
pd.DataFrame(decision_list)[attribute_names + ['label']].sort_values('label')
| | IS | OS | DC | BC | CC | label |
|---|---|---|---|---|---|---|
| 0 | NaN | NaN | 1 | NaN | NaN | 1 |
| 1 | NaN | 1.0 | 2 | NaN | NaN | 2 |
| 2 | NaN | 2.0 | 2 | NaN | NaN | 2 |
| 3 | NaN | 3.0 | 2 | NaN | NaN | 2 |
| 5 | NaN | 5.0 | 2 | NaN | NaN | 2 |
| ... | ... | ... | ... | ... | ... | ... |
| 39 | 1.0 | NaN | 5 | NaN | NaN | 6 |
| 64 | 6.0 | NaN | 5 | 6.0 | 1.0 | 6 |
| 65 | 6.0 | 1.0 | 5 | 6.0 | 2.0 | 6 |
| 67 | 6.0 | 3.0 | 5 | 6.0 | 2.0 | 6 |
| 85 | NaN | NaN | 6 | 6.0 | 6.0 | 6 |
86 rows × 6 columns
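The probability that condition attribute $A_i$ takes the value $j$ is estimated by its relative frequency: $$ p_{jA_i} = \frac{\#jA_i}{N} $$ where $A_i$ denotes a condition attribute, $jA_i$ denotes a value of condition attribute $A_i$, $\#jA_i$ is the number of samples for which $A_i$ takes the value $j$, and $N$ is the total number of samples.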
attributes = set_attribute_probability(nodes, attributes)
show_attributes_distribution(attributes)
| | Name | Probability |
|---|---|---|
| 0 | IS | [0.0084, 0.0084, 0.0084, 0.0420, 0.1008, 0.8319] |
| 1 | OS | [0.0084, 0.0084, 0.0084, 0.0336, 0.0336, 0.9076] |
| 2 | BC | [0.0084, 0.0168, 0.0252, 0.0336, 0.1176, 0.7983] |
| 3 | DC | [0.0084, 0.0168, 0.0672, 0.0672, 0.2101, 0.6303] |
| 4 | CC | [0.0084, 0.1261, 0.2773, 0.2773, 0.1008, 0.2101] |
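The ratio $\#jA_i / N$ is a plain relative frequency. The following minimal sketch uses a hypothetical helper `value_probabilities`, not the code of `set_attribute_probability`. The level counts 1, 1, 1, 5, 12 and 99 are an inference: they are the integers that reproduce the printed IS probabilities for $N = 119$ nodes, and they are not read from the data.

```python
from collections import Counter


def value_probabilities(values, levels):
    """Relative frequency of each level among `values`, in the order of `levels`."""
    counts = Counter(values)
    return [counts[level] / len(values) for level in levels]


# ASSUMPTION: level counts inferred from the printed IS probabilities (N = 119).
is_levels = [1] * 1 + [2] * 1 + [3] * 1 + [4] * 5 + [5] * 12 + [6] * 99
probs = value_probabilities(is_levels, levels=range(1, 7))
print([round(p, 4) for p in probs])
```

Because every node receives exactly one level per attribute, the six probabilities of an attribute sum to 1.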
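The probability of decision rule $l$ is taken as the product of the probabilities of the attribute values it tests: $$ P_{\text{Rule}l} = \prod_{k} p_{jA_k} $$ where $p_{jA_k}$ is the probability that condition attribute $A_k$ takes the value $j$ required by decision rule $l$.
Here the distributions of the condition attributes are treated as approximately independent, that is, the value of one condition attribute is assumed not to be affected by the values of the others.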
decision_list = set_decision_probability(decision_list, attributes)
show_decision_probability(decision_list)
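| | IS | OS | DC | BC | CC | p |
|---|---|---|---|---|---|---|
| 0 | NaN | NaN | 1 | NaN | NaN | 0.008403 |
| 1 | NaN | 1.0 | 2 | NaN | NaN | 0.000141 |
| 2 | NaN | 2.0 | 2 | NaN | NaN | 0.000141 |
| 3 | NaN | 3.0 | 2 | NaN | NaN | 0.000141 |
| 4 | NaN | 4.0 | 2 | NaN | NaN | 0.000565 |
| ... | ... | ... | ... | ... | ... | ... |
| 81 | NaN | NaN | 6 | 6.0 | 2.0 | 0.063421 |
| 82 | NaN | NaN | 6 | 6.0 | 3.0 | 0.139527 |
| 83 | NaN | NaN | 6 | 6.0 | 4.0 | 0.139527 |
| 84 | NaN | NaN | 6 | 6.0 | 5.0 | 0.050737 |
| 85 | NaN | NaN | 6 | 6.0 | 6.0 | 0.105702 |
86 rows × 6 columns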
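Under the independence approximation, the probability of a rule is the product of the probabilities of the attribute values it tests, and attributes the rule does not test (NaN in the table) drop out. The sketch below uses a hypothetical helper `rule_probability`, not the code of `set_decision_probability`. The fractions over 119 are inferred from the printed OS and DC probabilities, so they are an assumption.

```python
import math


def rule_probability(rule, attribute_probs):
    """Product of value probabilities over the attributes that a rule tests.

    rule: {attribute name: required level (1-based)}; untested attributes
    are simply absent from the mapping.
    """
    return math.prod(attribute_probs[attr][level - 1] for attr, level in rule.items())


# ASSUMPTION: counts over N = 119 inferred from the printed probabilities.
attribute_probs = {
    "OS": [1 / 119, 1 / 119, 1 / 119, 4 / 119, 4 / 119, 108 / 119],
    "DC": [1 / 119, 2 / 119, 8 / 119, 8 / 119, 25 / 119, 75 / 119],
}
p_rule0 = rule_probability({"DC": 1}, attribute_probs)
p_rule4 = rule_probability({"OS": 4, "DC": 2}, attribute_probs)
print(round(p_rule0, 6), round(p_rule4, 6))
```

The two rules correspond to rows 0 and 4 of the table above, whose printed values are 0.008403 and 0.000565.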
decision_probability_bar(decision_list)
Hierarchical risk of node vulnerability under the corresponding discrete partition:
$$ P_j = \sum_{l=1}^{M} P_{\text{Rule}l}(j_{A_n}) $$
risks = get_hierarchical_risk(decision_list)
hierarchical_risk_bar(risks)
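One reading of the sum $P_j$ is that, for a given attribute $A_n$ and level $j$, it adds up the probabilities of all rules whose condition on $A_n$ equals $j$. The sketch below follows that reading, which is an assumption and may differ from what `get_hierarchical_risk` does; the helper name `hierarchical_risk` is hypothetical. It uses only the first five rules of the probability table, so the result is a partial sum and not the actual risk.

```python
def hierarchical_risk(rules, attr, level):
    """Sum the probabilities of all rules whose condition on `attr` equals `level`.

    ASSUMPTION: this interpretation of P_j is not confirmed by the source.
    """
    return sum(rule["p"] for rule in rules if rule.get(attr) == level)


# First five rules of the probability table; NaN entries are omitted.
rules = [
    {"DC": 1, "p": 0.008403},
    {"OS": 1, "DC": 2, "p": 0.000141},
    {"OS": 2, "DC": 2, "p": 0.000141},
    {"OS": 3, "DC": 2, "p": 0.000141},
    {"OS": 4, "DC": 2, "p": 0.000565},
]
risk_dc2 = hierarchical_risk(rules, "DC", 2)
print(risk_dc2)
```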